Optimizing Gaussian process regression (GPR) hyperparameters with three metaheuristic algorithms for viscosity prediction of suspensions containing microencapsulated PCMs

Suspensions containing microencapsulated phase change materials (MPCMs) play a crucial role in thermal energy storage (TES) systems and have applications in building materials, textiles, and cooling systems. This study focuses on accurately predicting the dynamic viscosity, a critical thermophysical property, of suspensions containing MPCMs and MXene particles using Gaussian process regression (GPR). Twelve hyperparameters (HPs) of GPR are analyzed separately and classified into three groups based on their importance. Three metaheuristic algorithms, namely genetic algorithm (GA), particle swarm optimization (PSO), and marine predators algorithm (MPA), are employed to optimize HPs. Optimizing the four most significant hyperparameters (covariance function, basis function, standardization, and sigma) within the first group using any of the three metaheuristic algorithms resulted in excellent outcomes. All algorithms achieved a reasonable R-value (0.9983), demonstrating their effectiveness in this context. The second group explored the impact of including additional, moderate-significant HPs, such as the fit method, predict method and optimizer. While the resulting models showed some improvement over the first group, the PSO-based model within this group exhibited the most noteworthy enhancement, achieving a higher R-value (0.99834). Finally, the third group was analyzed to examine the potential interactions between all twelve HPs. This comprehensive approach, employing the GA, yielded an optimized GPR model with the highest level of target compliance, reflected by an impressive R-value of 0.999224. The developed models are a cost-effective and efficient solution to reduce laboratory costs for various systems, from TES to thermal management.

www.nature.com/scientificreports/ 25 to 50 °C.An ANN was employed to estimate the TC and DV accurately.Notably, a neural network architecture with a 2-4-4-2 structure demonstrated exceptional predictive performance, closely aligning with the experimental results.Marani et al. 26 proposed a deep learning approach for the modeling of hydration in cementitious systems incorporating MPCMs.Their study aimed to determine the apparent activation energy associated with the hydration process.The findings of their study demonstrated the superior predictive performance of the deep neural network over the gradient-boosting ensemble method in forecasting the cumulative heat and hydration rate in cementitious systems.In another work, Marani et al. 27 introduced a mixed ML-based procedure for concrete incorporating MPCMs.They employed a tabular generative adversarial network (TGAN) to generate synthetic mixture design data and developed robust predictive ML models.By enhancing the TGAN-GBR model with the particle swarm optimization (PSO), they optimized the mixture design of concrete and mortar with various MPCMs.Tanyildizi et al. 28 proposed a deep-learning methodology to predict the compressive strength of cementitious composites integrated with MPCMs.They developed purposeful models using experimental datasets, including XGBoost and deep learning.The models effectively predicted compressive strength with low error and identified influential parameters for mixture design, aiding concrete development incorporating MPCM.
Based on reviewed research, it is clear that MPCM-based suspensions hold immense potential for various applications.However, the prediction of their TPPs has been overlooked by researchers, creating a significant research gap that needs to be addressed.On the other hand, machine learning methods like Gaussian process regression (GPR) offer great power, but their effectiveness hinges on carefully chosen hyperparameters (HPs).Existing research lacks extensive sensitivity analysis on these hyperparameters.The authors of this study have directed their attention toward addressing these two concerns by focusing on the precise GPR-based prediction of dynamic viscosity in MPCM suspensions, which play a vital role in the design of energy storage systems.For this purpose, the present research introduces a comprehensive three-step procedure for identifying the crucial hyperparameters in GPR.It employs three metaheuristic algorithms (marine predators algorithm, particle swarm optimization, and genetic algorithm) for each step, offering a novel framework for optimizing hyperparameter selection in complex machine learning methods with numerous hyperparameters.The models developed in this research offer engineers a profound understanding of the dynamic viscosity of pure and nano-enhanced MPCM suspensions.The proposed novel framework offers a tool for diagnosing and optimizing HPs in ML methods involving several hyperparameters.These advancements have significant implications for enhancing efficiency across various industries, ranging from TES systems to thermal management mechanisms.Moreover, the developed models substantially decrease the expenses of experimental research and computational simulations, rendering them efficient and cost-effective solutions.Figure 1 presents a concise overview of the modeling process employed in the present study.Following the discussion of the research background and motivation in Section "Introduction", Section "Data analysis" will delve into the datasets employed in the study and the methods used for data analysis.Section "Methodology" will then review the research methodology, including the theoretical foundation of the metaheuristic algorithms and the machine learning method.Section "Models development" will focus on the model development strategy, presenting the procedure flowchart and providing specific details regarding the implementation of the modeling process.To ensure a clear evaluation of the models' performance, Section "Evaluation criteria" will describe the statistical criteria and parameters.Section "Results and discussion" will then present the overall and detailed evaluations of the developed models, including a comparative analysis to highlight subtle differences.Finally, Section "Conclusion" will conclude the paper by summarizing the key findings of the research.

Data analysis
In this study, the experimental data of Jin et al. 29 are used to model the dynamic viscosity of the suspension containing MPCM.To achieve optimal dispersion stability, the researchers investigated different proportions of isopropyl alcohol (IPA) to water as base fluid.They determined that a specific mass ratio (42:58) yields the best MPCM-based suspension stability.They further examined the effects of MPCM concentrations (5-15 wt%) and MXene concentrations (0.01-0.5 wt%) on improving the thermophysical properties.The suspensions underwent rigorous testing within a temperature range of 20 to 60 °C to assess their performance.
Table 1 shows the statistical specifications of the laboratory datasets.The statistical analysis of the provided data reveals valuable insights into each parameter's distribution, central tendency, variability, and shape 30 .Regarding temperature (°C), the data shows a relatively symmetrical distribution with a mean and median of 40 °C.The temperature values range from 20 °C to 60 °C, indicating a moderate spread.The standard deviation of 14.302 suggests a considerable variability in temperature measurements.The skewness of 0 indicates a lack of significant skew, while the negative kurtosis of − 1.311 implies a slightly flatter distribution compared to a normal distribution.The Kolmogorov-Smirnov statistic of 0.158 suggests a relatively small deviation from a normal distribution.
For MPCM MF (wt%), the data displays a left-skewed distribution with a mean of 8.889% and a median of 10%.The weight percentages range from 0 to 15%, indicating a moderate spread.The standard deviation of 3.973 reflects a notable variability in MPCM weight measurements.The positive kurtosis of 0.902 demonstrates a relatively peaked distribution, while the negative skewness of − 0.995 suggests a slight asymmetry towards lower values.The Kolmogorov-Smirnov statistic of 0.388 implies a moderate deviation from a normal distribution.
In the case of MXene MF (wt%), the data exhibits a right-skewed distribution with a mean of 0.1067% and a median of 0.01%.The weight percentages vary between 0% and 0.5%, indicating a relatively narrow spread.The standard deviation of 0.1686 suggests a relatively low variability in MXene weight measurements.The positive skewness of 1.541 indicates a clear asymmetry towards higher values, while the positive kurtosis of 0.965 suggests a relatively peaked distribution.The Kolmogorov-Smirnov statistic of 0.298 implies a moderate deviation from a normal distribution.
Concerning dynamic viscosity (mPa•s), the data demonstrates a right-skewed distribution with a mean of 4.2101 mPa•s and a median of 3.2449 mPa•s.The dynamic viscosity values range from 1.0080 to 16.1990 mPa•s, indicating a widespread.The standard deviation of 2.8577 reflects a moderate variability in DV measurements.The positive skewness of 2.036 suggests a clear asymmetry towards higher values, while the positive kurtosis of 5.755 demonstrates a relatively peaked distribution.The Kolmogorov-Smirnov statistic of 0.31 implies a moderate deviation from a normal distribution.
Figure 2 shows the frequency histogram of all parameters.Visualizing the data through histograms can provide a clearer understanding of each variable's distribution patterns and frequency of measurements.The graphic representation of Fig. 2 for different parameters is entirely consistent with the findings of Table 1.In addition, utilizing violin plots can facilitate a more profound comprehension and enhanced insight into the data.As shown in Fig. 3, they combine a box plot and kernel density plot, allowing for the simultaneous display of distribution shape, central tendency, and variability.Violin plots also enable easy comparison between multiple groups or categories and can reveal multimodal or skewed distributions.This visual confirmation can complement the statistical analysis in Table 1 and help draw more robust conclusions from the data.In order to integrate all parameters into a single violin graph, data normalization is employed as follows: www.nature.com/scientificreports/ The Pearson Correlation Coefficient (PCC) assesses the quantitative and qualitative relationship between dynamic viscosity and independent variables 31 .PCC is a statistical measure that assesses the linear relationship between two variables.It quantifies the strength and direction of the association, ranging from − 1 to 1.The PCC's objective is to identify and quantify the extent of linear association and facilitate data analysis.PCC is quantified according to the following relationship 31,32 : (1) www.nature.com/scientificreports/ Figure 4 shows the PCC values between different variables.According to the correlogram, dynamic viscosity is moderately negatively correlated with temperature (− 0.45), strongly positively correlated with MPCM MF (0.66), and negligibly correlated with MXene MF (0.01).This means that as temperature increases, viscosity tends to decrease.Higher MPCM MF values are associated with higher viscosity, while no significant linear relationship exists between viscosity and MXene MF.These correlations provide valuable insights into the interdependencies between these variables, aiding in understanding their behavior and potential impact on each other.While PCC values provide insights into the linear relationship between variables, it is crucial to acknowledge that there could be a significant non-linear relationship between them 33 .

Methodology
This research uses the GPR method to model dynamic viscosity in terms of independent variables.The genetic algorithm (GA), marine predators algorithm (MPA), and particle swarm optimization (PSO) algorithm are used to optimize the HPs of the GPR model.The rest of this section introduces the theory of modeling and optimization algorithms in detail.

Gaussian process regression
GPR is a potent technique in ML that offers a flexible and robust approach for modeling and predicting data 23 .It treats functions as random variables and assumes that any finite set of variables follows a joint Gaussian distribution (JGD).This characteristic enables GPR to capture intricate relationships and provide uncertainty estimates.The core process involves specifying a prior belief about the underlying function, typically assumed to follow a Gaussian distribution (GD) 34 .This prior belief is updated by incorporating observed data to obtain a posterior distribution, representing an improved understanding of the function given the data.This posterior distribution allows predictions to be made at new, unseen points in the dataset.What sets GPR apart is its nonparametric nature, as it does not require making assumptions about a fixed parametric form for the underlying function.This adaptability enables it to model various functions, including those with highly non-linear relationships.
Consider a set of data points, x i , y i ; i = 1,2, . . ., n , where x i ∈ R d and y i ∈ R d .It is assumed that these data points originate from an unknown distribution.When data points are utilized for training in the Gaussian process regression, a predictive model is constructed.This model aims to estimate the output vector ( y new ) corresponding to a given input vector ( x new ) 23,35 : The term " β " is calculated in the training process.The term " ε " represents an error or noise term in the model: The estimation of error variance ( σ 2 ) is derived from the available data.In Gaussian process regression, the prediction of outputs relies on the utilization of the basis functions (BF) and covariance functions (CF) of latent variables ( f (x i ), i = 1,2, . . ., n ) within a Gaussian process (GP) framework.The BF, h(x) , serve to transform the input vector into a p-dimensional feature space, and the CF assesses the strainer properties of the corresponding outputs.The GP comprises a collection of random parameters, each selected from a JGD.Mathematically, when f (x), x ∈ R d is regarded as a GP, the joint distribution of f (x 1 ) , f (x 2 ) , …, f (x n ) is Gaussian, where x 1 , x 2 , …, x n represent the input variables 23,35 .
A GP is characterized by its CF, k(x, x′) , and mean function (MF), m(x) .In other words, if we consider f (x), x ∈ R d as a GP 23,35 : The GP model, given by h(x) T β + f (x) , where f (x) ∼ GP(0, k(x, x′)) , provides a framework to model the output vector y in the following manner 23,35 : where The model exhibits a JGD, which can be described as follows: ( The CF, k(x, x′|θ ) , is characterized by a collection of kernel parameters ( θ).
The GPR model provides a versatile framework for modeling and prediction tasks, offering a wide range of HPs that greatly influence the accuracy and reliability of its outputs.The selection of these parameters and configurations is crucial and should be tailored to the specific problem, taking into account the unique specifications and requirements of the task.A comprehensive overview of these HPs can be found in Fig. 5.

Genetic algorithm
The genetic algorithm leverages a bio-inspired approach to tackle intricate optimization challenges 36 .It iteratively explores a solution landscape, mimicking the process of natural selection and evolution.This approach can be broken down into distinct phases: 1. Population seeding An initial population of candidate solutions (individuals) is generated within the defined parameter space.These individuals represent potential answers to the optimization problem.2. Fitness assessment Each individual's efficacy is evaluated using an objective function.This function quantifies how well a particular solution aligns with the desired outcome.3. Selection, recombination, and mutation A suite of operators are employed to manipulate the population and generate novel candidate solutions.Selection prioritizes more effective individuals for reproduction, recombination combines elements from selected individuals, and mutation introduces random variations to maintain population diversity.
This iterative cycle of evaluation and manipulation (steps 2 and 3) continues until a pre-defined termination criterion is reached 37 .Through these repeated steps, the GA progressively refines the population, ultimately converging on solutions with superior effectiveness as determined by the objective function.The GA's strength lies in its ability to efficiently explore a vast solution space while ensuring population diversity, thus preventing premature convergence to suboptimal solutions, a common pitfall in optimization algorithms.

Particle swarm optimization
The particle swarm optimization algorithm draws inspiration from the collective movement patterns observed in animal groups like flocks or fish schools 42,43 .These natural systems exhibit remarkable efficiency in navigating intricate environments, a concept that PSO utilizes to address optimization challenges.Unlike traditional methods prone to difficulties in non-linear and non-convex search spaces, PSO excels at finding solutions very close to the optimal value within such scenarios 44 .Its power lies in leveraging the "collective intelligence" of a population of virtual particles as they cooperatively search for optimal solutions 45 .The PSO algorithm operates through a repetitive process 42,45,46 : 1. Seeding the swarm Initially, a swarm of particles is randomly distributed within the designated search space.
Each particle possesses an associated velocity.2. Fitness assessment The effectiveness of each particle is evaluated based on the problem's objective function.

Velocity and position updates
Particles update their velocities and positions considering two crucial factors: 4. Individual memory Each particle retains a memory of its own most effective position encountered so far (representing its own learning).5. Swarm knowledge Particles are also influenced by the most effective position discovered by any particle within the swarm (representing a form of social learning).This information guides their movement towards promising regions of the search space.6. Iterative refinement The process continues iteratively as particles update their positions and velocities, gradually converging towards optimal solutions.A pre-defined stopping criterion, such as reaching a maximum number of iterations or achieving a desired level of effectiveness, determines the algorithm's termination.7. Optimal candidate identification The particle with the highest fitness value throughout the optimization process represents the PSO algorithm's final output, signifying the solution closest to the optimum.
PSO's strength lies in its ability to maintain a balance between exploration (searching new areas) and exploitation (focusing on promising regions) through the combined influence of individual and collective memory.This dynamic approach allows PSO to effectively navigate complex search spaces and identify solutions very close to the optimal value.

Marine predators algorithm
The MPA is a nature-inspired optimization algorithm that mimics the hunting behavior of marine predators, such as sharks, in an aquatic ecosystem.Faramarzi et al. 49 introduced the MPA in 2020, which has since gained recognition for its remarkable capability to efficiently explore and exploit search spaces.MPA has demonstrated its effectiveness, particularly in tackling intricate optimization problems 50 .
In order to comprehend the intricacies of this algorithm, it is crucial to delve into its foundation, which lies in the hunting behavior of marine predators.The Lévy strategy is commonly observed among marine predators, such as sharks, especially in prey-scarce environments.However, when these predators encounter areas abundant with prey, they predominantly adopt a Brownian motion pattern characterized by random movement.This switch in foraging behavior highlights their adaptability to different ecological conditions and the ability to optimize their search strategies based on prey availability 51 .The type of movement adopted by each participant, whether a predator or prey, plays a significant role in determining the most effective approach for encounters.Additionally, the relative velocities between the predator and prey further shape the encounter rate policy 52 .The following www.nature.com/scientificreports/summarization encapsulates the key governing policies that drive optimal foraging strategies, interactions, and memory formation in marine predators 49 : • Marine predators employ the Lévy strategy when navigating environments with low prey concentrations, optimizing their search patterns for efficient foraging.However, in areas abundant with prey, they predominantly switch to the Brownian movement.• Marine predators exhibit consistent proportions of Lévy and Brownian movement patterns throughout their lifetime as they traverse diverse habitats.• Marine predators exhibit changes in their behavior in response to environmental factors, including both natural phenomena such as eddy formation and anthropogenic influences like fish aggregating devices (FADs).• The optimal predator strategy in scenarios characterized by a low-velocity ratio (v = 0.1) is the Lévy movement.This holds irrespective of whether the prey exhibits Brownian or Lévy movement patterns.• In the case of a unit velocity ratio (v = 1), the optimal strategy for the predator is Brownian movement when the prey exhibits Lévy motion.• In situations characterized by a high-velocity ratio (v = 10), the optimal strategy for the predator is to remain stationary, regardless of whether the prey exhibits Brownian or Lévy movement patterns.• Marine predators leverage their advanced memory capabilities to their advantage, utilizing recollection of their conspecifics and successful foraging opportunities' locations.
Drawing upon these key observations, the optimization process of MPA can be delineated into three distinct phases.The first phase (exploration phase) pertains to high-velocity ratios, where prey outpaces the predator.The second phase (transition phase) encompasses unit velocity ratios, wherein predator and prey move at nearly equivalent speeds.Finally, the third (exploitative phase) addresses low-velocity ratios, where the predator exhibits greater velocity than the prey.In addition to the three distinct phases outlined for optimization, the inclusion of natural and anthropogenic environmental factors is crucial in the development of MPA.All these factors are mathematically modeled to incorporate their effects within the MPA framework.
An extensive investigation into the marine predators algorithm is provided in references 49,50,53,54 .These explorations delve into the theoretical principles upon which MPA is based and the intricacies involved in its practical implementation.

Models development
By systematically exploring and evaluating different HP settings, we can gain valuable insights into the impact of these parameters on the ML algorithms' performance.Indeed, finding the optimal combination of HPs that influence machine learning algorithms' training and structural aspects can be challenging and complex.Without a principled method or systematic approach, achieving this optimal combination may prove difficult, if not impossible.The interplay between different hyperparameters and their effects on the algorithm's performance can be intricate and non-intuitive.
In situations where finding the optimal set of HPs for ML algorithms is challenging, utilizing metaheuristic algorithms (GA, PSO, and MPA) can offer an intelligent solution.As depicted in Fig. 5, the present study focuses on examining the impact of 12 distinct HPs on the performance of the GPR method.These hyperparameters encompass a range of factors, including the kernel function, optimizer, predict method, active set method, active set size, distance method, computation method, regularization, standardization, sigma, basis function, and fit method.The effects of these HPs on GPR model performance can vary depending on the nature of the problem being addressed and the characteristics of the available data.
This paper undertakes an analysis to comprehensively evaluate the impact of each HP on the performance of the data-driven model.The purpose of this analysis is to gain insight into hyperparameters.The initial step focuses on identifying the most influential HPs by assessing their individual effects on the model's performance.These constitute the first group of HPs.In the second step, the less impactful HPs are added to the first group to refine the model's performance further, referred to as the second group of HPs.Considering their combined effects, this step allows for a more comprehensive exploration of the HPs space.Finally, in the last step, all 12 HPs are considered simultaneously, called the third group.This integration enables the model to leverage the collective impact of all HPs.This research's optimization and modeling processes are conducted using MATLAB software (R2021b).
This research uses the training dataset, which constitutes 80% of the entire database, to train the machine learning algorithms.During the training phase, the model learns patterns and relationships within the data to make accurate predictions.After the training phase, the remaining 20% of the dataset, known as the testing dataset, is employed to evaluate the effectiveness of the trained models.This independent dataset allows researchers to evaluate how well the models generalize to unseen data and estimate their performance in realworld scenarios.In addition, employing specific datasets for both training and testing purposes is crucial to ensure a fair and consistent comparison between different models throughout the modeling process.Also, a subset of training data is separated as validation data to reduce the risk of model overtraining.The leave-one-out cross-validation is utilized for this purpose.
A fixed number of function evaluations (NFEs) is used to ensure a fair comparison between optimization algorithms.This approach helps to standardize the evaluation process and ensures that each algorithm is given an equal opportunity to optimize and converge within the same computational budget.In the HP optimization process, the study utilizes 100, 200, and 300 function evaluations in the first, second, and third steps to guide the exploration and refinement of HP settings.Due to the random nature of optimization algorithms and the substantial influence of the initial population's quality on the final solution, the optimization process is repeated five times for each case in this study.The best solution obtained across these five runs is then reported as the final solution.
The selection of metaheuristic optimization algorithms (MOAs) was guided by two key considerations.Firstly, the GA and PSO were chosen due to their well-documented effectiveness in optimizing hyperparameters for machine learning models, especially those involving mixed-integer values.Secondly, we included the MPA as a novel promising, yet under-evaluated MOA.This inclusion aimed to assess its potential for hyperparameter optimization and contribute to its broader exploration within the field.This selection strategy facilitated a twofold benefit.It allowed for benchmarking established MOAs like GA and PSO against a novel algorithm (MPA).Additionally, it ensured a balance between exploration of diverse solution spaces and efficient refinement through the well-established search capabilities of GA and PSO.Parameter settings for the GA, PSO, and MPA algorithms can be found in Table 2. Figure 6 shows the flowchart of the present study.
Furthermore, the fitness function for the metaheuristic algorithms is defined by the mean squared error (MSE) on the testing data set.This choice ensures that the algorithms prioritize models with strong generalization capabilities, emphasizing performance on previously unseen data, which is crucial for developing reliable regression models.

Evaluation criteria
To evaluate the efficacy of GPR models, five statistical metrics are employed.These metrics consist of Willmott's Index of Agreement (I A ), Coefficient of Determination (R 2 ), Correlation Coefficient (R), Mean Absolute Percentage Error (MAPE), and Mean Squared Error (MSE) 55,56 : In the formulas, the symbol "n" represents the total number of data points.The notation " Y i,Pred " denotes the predicted values, while " Y i,Exp " represents the corresponding experimental values.
The MSE and MAPE metrics provide a measure of the error magnitude associated with the predictions.A lower value of MAPE and MSE indicates a higher level of accuracy and validity in the model's predictions.On the other hand, R, R 2 , and I A criteria produce a value between 0 and 1, with a value closer to 1 suggesting a strong agreement and validity in the predictions.
Alongside the aforementioned criteria, two additional metrics, Absolute Relative Deviation (ARD) and Relative Error (RE), are employed to graphical analyze the accuracy of outputs:

Influence of various HPs
This section aims to analyze the influence of each hyperparameter.The impact of different HPs on the model's performance will be discussed in detail.Subsequently, the HPs will be categorized into three distinct groups based on their effectiveness.Each group will be optimized separately, focusing on fine-tuning the HPs within that specific category.The HPs are classified into three groups based on their susceptibility and effectiveness.
The first group comprises the most susceptible HPs, which are highly sensitive and can significantly impact

Marine Predators Algorithm
Select desired hyperparameters Fig. 6.Flowchart of the present study.the model's performance.The second category consists of HPs, the first group plus those identified as effective though insignificant in improving the model's performance.Finally, the third category encompasses all the HPs.
The effectiveness of each hyperparameter is assessed by observing its impact on the model's performance.This evaluation is done by measuring the change in the model's performance (R 2 on the testing data) when different options or values of the hyperparameter are applied to the default model.The default model refers to a state where no specific optimization or principled selection has been applied to the HPs.In this context, the HPs are initialized with default values provided by MATLAB and listed in Table 3.The performance of the default model is evaluated and presented in Table 4.It is evident from the results that the default model, with a MAPE of 28% and an R 2 of approximately 0.86, exhibits poor performance when applied to the testing data.These values indicate that the default HP values do not yield satisfactory outputs for the given problem.This outcome underscores the necessity to optimize the HPs for the GPR models.
Figures 7 and 8 visually represent how changes in hyperparameter values impact the model outputs' performance.By analyzing the relative deviation from the R 2 of the default model, one can observe the extent to which each hyperparameter affects the model's accuracy.Additionally, the R 2 of the models under different hyperparameter values provides insights into the choices for achieving improved performance.
In order to determine the HPs of the first group, a selection is made based on their effectiveness.In this case, four of the most effective HPs are chosen.Based on the analysis of the figures, it is observed that changing the covariance (kernel) function from the default value (squared exponential) significantly impacts the testing data's accuracy.It specifies how the GP model assigns uncertainty to predictions based on the proximity of input points in the feature space.The default model R 2 value increases from 0.86 to a minimum of 0.908 (ARD squared exponential) and a maximum of 0.984 (Matérn32).The absolute relative deviation from changing the kernel function ranges from 5.43% to 14.28%.Additionally, changing the basis function can enhance the model's performance by a range of 3.6% to 10.74%.The change of sigma, which represents the initial value for the noise standard deviation of the GPR model, can have a notable effect on the model's performance.Figure 7 demonstrates that the choice of sigma values greater than 0.1 has a negligible effect on the model output.The model's outputs remain relatively stable, and the R 2 value remains close to the initial value of 0.86.However, when sigma values are less than 0.01, the model's efficiency is drastically affected.The R 2 value decreases significantly from 0.86 to 0.368.The parameter "standardize" plays a crucial role in evaluating the impact of data standardization on the model.When set to "true", it indicates that the data will undergo a standardization process.This process involves centering and scaling each dataset column based on the mean and standard deviation.In the default mode, standardization is not performed (standardize = "false").However, with standardization for the present data, there is an improvement in the accuracy of the output model (3.24%).
The second group of parameters includes the four previously mentioned effective parameters and additional parameters that, while insignificant, are still considered.These HPs are the fit method, predict method, and optimizer.The optimizer encompasses a set of optimization techniques utilized to compute model parameters.The predict method outlines the procedure for generating predictions based on the GP samples.The estimation of parameters in the GPR model is performed using the fit method.Based on the Figs.7 and 8, changing the values of the fit method, predict method, and optimizer has a relatively small impact on the model's accuracy (less than 1%).Furthermore, changing the values of HPs, such as active set method, active set size, distance method, computation method, and regularization, does not affect the precision of the base (default) model.In GPR, the active set method assigns observations to vectors during the model training.The active set size determines the number of observations included in the active set during the training process.The active set size is an integer value from 1 to the number of data points.The distance method calculates the inter-point lengths between data points when evaluating kernel functions in GP models.The computation method calculates the log-likelihood and gradient in GP models.Finally, regularization is a fundamental concept within GP models that plays a crucial role in mitigating overfitting and enhancing the model's generalization capability.It involves adding a regularization term to the log-likelihood function, encouraging smoothness and simplicity in the estimated functions.It should be noted that the effect of these HPs should not be underestimated because the different HPs can interact with each other, and changing one HP may impact the optimal values of other HPs.Hence, the third group combines the HPs of the first and second groups, along with the five HPs mentioned.Considering all these parameters makes the intricate interplay between them in the modeling process evident.

Evaluation of various categories of HPs
Table 5 shows the results of optimized models by metaheuristic algorithms for the first group of HPs, which include four critical HPs.Based on the information provided in Table 5, the three algorithms used to optimize the HPs have achieved very similar performance results.The reported values of 0.998303, 0.996608, and 0.999135 for the R, R 2 , and I A indicate the reasonable performance of all three optimization algorithms.In contrast, a subtle discrepancy becomes apparent among the models when considering the aspect of error.Specifically, the MPA-GPR model exhibits a lower MAPE when evaluated on the testing dataset, albeit with a marginal variance compared to the other two models.The subtle divergence observed in the models' responses can be attributed to the sigma value optimized through distinct algorithms.As evidenced by Table 6, the MPA algorithm yielded  www.nature.com/scientificreports/ a lower estimated sigma value despite employing identical configurations of the basis function and kernel function and standardized as the other algorithms.Figure 9 illustrates the optimization process of GPR models using three different metaheuristic algorithms for the first group of hyperparameters.The x-axis represents the number of generations or iterations, while the y-axis shows the fitness values measured by MSE on a logarithmic scale.Based on the figure, the best fitness values for GA show a significant improvement from the first generation, rapidly decreasing and stabilizing around the second generation, indicating that GA efficiently finds a near-optimal solution early in the optimization process.In contrast, the best fitness values for PSO demonstrate a favorable start, achieving a stable optimum value after the sixth iteration, reflecting a slower convergence compared to GA but reaching a comparable final fitness value.The best fitness values for MPA show a gradual improvement, taking up to the sixth iteration to converge to a stable value, indicating a slower convergence speed than both GA and PSO.The mean fitness values for GA decrease gradually across generations, showing a trend similar to the best fitness values but with higher MSE initially, indicating that GA maintains a balance between exploration and exploitation.This trend is also observed with a steeper slope for the PSO algorithm.The sudden increase in mean fitness in iterations 7-9 indicates the depth of exploration in the space of hyperparameters.For MPA, the mean fitness values show a steady decline, aligning closely with the best fitness values, suggesting that the population of solutions in MPA improves uniformly across iterations.Overall, GA is the most efficient algorithm in terms of rapid convergence and achieving low MSE for the first group of hyperparameters, with PSO also performing well but with a slower convergence rate, and MPA eventually achieving comparable performance despite its slower convergence.www.nature.com/scientificreports/A more comprehensive understanding of the performance of optimization algorithms can be attained by considering the HPs within the second group.This group encompasses seven specific HPs that undergo optimization.The outcomes of the optimal models for the testing and training data are presented in Table 7.As per the table, the models optimized with seven hyperparameters using various algorithms resemble those observed in the first group in terms of their accuracy.Nevertheless, the PSO-based model has exhibited superior performance compared to the other two algorithms in terms of key evaluation metrics such as MSE, R, R 2 , and I A .However, it is noteworthy that the PSO-GPR model demonstrates a weaker performance in terms of MAPE in comparison to the model based on the GA.According to the findings presented in Table 8, it is obvious that all three models exhibit similarity in their basic parameters, including basis function, kernel function, and standardization.However, distinctions arise in some HPs, including fit method, optimizer, and sigma, which account for the subtle differences observed in their performance.
Figure 10 presents the HP optimization process of GPR models using GA, PSO, and MPA for the second group of HPs.The best fitness values for GA show a sharp decline from the third to the fourth generation, reaching a stable minimum value, and indicating a rapid convergence to an optimal solution.The best fitness values for PSO exhibit a faster improvement, stabilizing at a lower MSE after the second iteration, suggesting a steady convergence with consistent performance across iterations.The best fitness values for MPA demonstrate a slower initial improvement, similar to the first group, but achieve a stable and comparable minimum MSE by the sixth iteration.The mean fitness values for PSO decrease progressively, mirroring the trend of the best fitness values but showing a wider range initially, indicating PSO's effective balance between exploration and exploitation.For MPA, the mean fitness values steadily decline, closely following the best fitness values, reflecting a uniform improvement in the solution population over iterations.GA's mean fitness values decrease slowly similar to PSO, showing higher MSE at the seventh iteration but eventually converging to a stable value near 0.1, indicating a Including all HPs in the optimization process allows for exploring their interactions and determining their mutual influence on the accuracy of the models.By considering the collective impact of these HPs, a comprehensive analysis can be conducted to unravel the intricate relationships and dependencies between them.The outcomes of the optimized models, incorporating 12 distinct HPs, are presented in Table 9.As per the table, the genetic algorithm has demonstrated an ability to identify a combination of HPs that significantly deviates from those observed in the other two models.The GA-GPR model demonstrates a remarkable level of compliance between its outputs and the respective targets, as indicated by the high values of R (0.999224) and R 2 (0.998449) obtained on the testing data.Furthermore, the GA-GPR model exhibits a lower error rate, with a MAPE of 4.024%, which shows 1% less error than the PSO-GPR and MPA-GPR models.The list of optimized HPs for the GA-, PSO-, and MPA-based models is presented in Table 10.Based on the table, it is obvious that all three optimization algorithms exhibit agreement with each other in only four parameters: basis function, kernel function, fit method, and computation method.This finding highlights the limited consensus among the algorithms regarding the optimal values for most HPs.Furthermore, the optimization of hyperparameters using GA, PSO, and MPA significantly altered the settings compared to the default values, enhancing the model's accuracy.Based on Table 3, the default configuration featured a ' quasinewton' optimizer, ' exact' predict and fit methods, 'random' active set method, and a 'squared exponential' kernel function with a basis function set to ' constant' .The optimized hyperparameters varied notably across the three meta-heuristic algorithms.The basis function was uniformly changed to 'none, ' and the kernel function shifted to 'ardmatern32' for all algorithms, indicating a preference for a more flexible and sophisticated kernel.For standardization, GA and PSO retained 'false,' while MPA adjusted it to 'true,' potentially improving normalization.The optimizers varied with GA using 'fminsearch, ' PSO 'lbfgs, ' and MPA 'fmincon, ' showing diverse approaches in local and global optimization techniques.The prediction method for GA and PSO switched to 'sd, ' while MPA maintained ' exact.' The active set size was reduced variably (31 for GA, 34 for PSO, and significantly to 4 for MPA), indicating different strategies in model complexity and computation.The active set method for PSO changed to ' entropy, ' enhancing sample selection diversity.Both PSO and MPA shifted the distance method to 'accurate' from 'fast, ' prioritizing precision over speed.The sigma values and regularization parameters also varied greatly, with GA and MPA increasing sigma values substantially (61.96242 and 266.4889, respectively), while PSO dramatically reduced it to 0.0001, and regularization increased for GA and PSO but slightly decreased for MPA.These tailored hyperparameter settings by each meta-heuristic algorithm resulted in a better-fitting and more accurate GPR model, demonstrating the efficacy of customized optimization in machine learning.Figure 11 demonstrates the optimization process of GPR models for the third group of HPs.According to the figure, the best fitness values for MPA show a sharp initial decline, stabilizing at a low MSE by the fourth iteration, demonstrating MPA's rapid convergence capability.PSO exhibits a more gradual reduction in MSE, achieving stability around the eighth iteration, reflecting a steady improvement and consistent performance.GA, on the other hand, shows a slower initial improvement but ultimately reaches a stable lower MSE by the sixth generation, indicating an effective convergence.The mean fitness values for GA follow a decreasing trend, though with a broader range initially, suggesting effective exploration and subsequent exploitation of the search space.MPA's mean fitness values decline steadily, closely aligning with the best fitness values, indicating a uniform enhancement in the population's overall performance.PSO's mean fitness values decrease more slowly, along with the sudden increases in the average fitness value, which is caused by the comprehensive search in the hyperparameter space.Generally, GA maintains its efficiency in gradual convergence and achieves the lowest MSE.PSO shows reliable performance with steady improvement over iterations, while MPA, despite a satisfactory start, ultimately achieves the worst optimization results.

Comparison of superior models
The regression diagrams presented in Fig. 12 illustrate the performance of the superior model across different categories of hyperparameters.Based on Fig. 12a,b, it is apparent that both the MPA-GPR and PSO-GPR models exhibit remarkably similar performance in both the testing and training phases.Upon closer examination of the regression diagrams for the MPA-GPR and PSO-GPR models, it is evident that the training data points exhibit a strong alignment with the Y = X line, indicating a high level of compliance.However, a noticeable deviation from the Y = X line is observed for some test data points, suggesting a certain level of variability in the model's predictive accuracy for unseen data.In contrast, the model based on the genetic algorithm, which optimizes all 12 parameters, demonstrates exceptional performance in the test phase.This model exhibits a distinct advantage over the MPA-GPR and PSO-GPR models, as evidenced by its superior ability to accurately predict outcomes for unseen data.This finding highlights the effectiveness of the genetic algorithm in identifying an optimal combination of hyperparameters that enhances the model's generalization capabilities.According to Fig. 12c, the model based on the GA does exhibit a slightly weaker performance in the training phase compared to the MPA-GPR and PSO-GPR models.Some training data points show a slight deviation from the Y = X line, indicating a minor discrepancy between the predicted and actual values.
In Fig. 13, a violin plot compares the outputs of the default model, the optimized models, and the actual data in the testing phase.The shape of the density function in the plot provides insights into the similarity between the model outputs and the actual data.Notably, the outputs of the GA-GPR model exhibit the highest similarity with the actual data, as evidenced by the shape of the density function.Conversely, the default model displays the largest discrepancy with the experimental data, indicating its limited accuracy in capturing the underlying patterns.The largest differences between the predicted and actual data are observed at low and high dynamic viscosity values.The optimal models' predictions in these regions are accompanied by a more significant error, suggesting that further improvements are needed to enhance their performance in these specific ranges.
Figure 14a presents the absolute relative deviation of the outputs of MPA-GPR, PSO-GPR, GA-GPR, and the base models from the actual values in the testing phase.Analyzing these values, we can observe distinct differences in the performance of the models.The MPA-GPR and PSO-GPR models exhibit similar ARDs, 0 3 6 9 12 15 18   Actual DV (mPa•s) ranging from approximately 0.8% to 21.6%.These models demonstrate moderate accuracy in predicting DV values, with some variations in performance across different data points.On the other hand, the GA-GPR model demonstrates significantly lower ARDs, ranging from approximately 0.1% to 19%.This indicates that the genetic algorithm-based model provides more accurate predictions compared to the MPA-GPR and PSO-GPR models.
The GA-GPR model's ability to optimize all 12 parameters results in improved performance and better capturing of the underlying patterns in the data.In contrast, the default (base) model shows much higher ARDs, ranging from approximately 0.8% to 183%.These large deviations suggest that the base model is inadequate in accurately predicting the viscosity values.Figure 14b presents a more detailed analysis of the absolute relative deviation of different models.The figure provides additional insights into the performance of each model by focusing on specific ranges of ARD values.The basic model exhibits a relatively high ARD, with approximately 33% of the testing data recording ARD values above 20%.This indicates that the basic model struggles to accurately predict viscosity values, particularly for a significant portion of the dataset.In contrast, both the PSO-GPR and MPA-GPR models show improved performance, with only about 11% of the data recording ARD values above 20%.This suggests that these models achieve better accuracy compared to the basic model, as they have a smaller percentage of predictions with large deviations from the actual values.The most notable improvement is observed in the GPR model based on the genetic algorithm (GA-GPR).Interestingly, none of the data records an ARD above 20% for this model, indicating its ability to provide highly accurate predictions across the testing dataset.Furthermore, the analysis reveals that approximately 11% of the data for all models fall within an ARD range of 10 to 20%.This suggests a similar level of performance in capturing viscosity values within this range for all models.One noteworthy point in Fig. 14b is the data with a small error (ARD < 1%).The GA-GPR model stands out in this aspect, with approximately 33% of the data falling into this category.In contrast, the PSO-GPR, MPA-GPR, and base models have a significantly lower percentage (around 11%) of data with such a small error.
Figure 15 provides insight into the behavior of the GA-GPR model by illustrating the relative error of its outputs in relation to the inputs.Analyzing this figure provides insights into the performance of the model under different conditions.Specifically, Fig. 15a focuses on a special case where the MXene nanomaterial is absent from the solution.Examining the relationship between temperature and RE, we can observe varying trends.Considering MPCM MF = 0 wt%, at 20 °C, the GA-GPR model produces a positive RE of 3.28%.As the temperature increases to 30 °C, the RE decreases significantly to a minimal value of 0.02%, indicating a much closer approximation to the actual values.However, at 40 °C and 50 °C, the RE remains close to zero, suggesting a relatively accurate prediction.Interestingly, at 60 °C, the model exhibits a negative RE of − 19.09%, indicating an overestimation of the viscosity values.When the MPCM dosage is increased to 5 wt%, the highest relative error is observed at 50 °C.The model exhibits a relative error of 8.8% at this temperature, indicating the most significant positive deviation from the actual values.This suggests that the GA-GPR model underestimates viscosity values in this particular range.However, when the MPCM mass fraction is set to 10 wt%, the relative error remains around or below 1%.This indicates a higher level of accuracy and suggests that the model performs well in predicting viscosity values in this dosage range.Interestingly, when the MPCM mass fraction is further increased to 15 wt%, a reasonable percentage error is only observed at low temperatures, specifically between 20 and 30 °C. Figure 15b highlights the relative error of the superior model under a specific condition where a constant dose of MPCM (10 wt%) is present.In this scenario, the model error remains within acceptable limits for most problem conditions, with a relative error below 1%.However, a notable exception occurs when the lowest temperature (20 °C) is combined with the maximum MF of MXene (0.5 wt%).In this case, the model shows a RE = 3%, indicating a slightly larger deviation from the experimental values.
Figure 16 presents the dynamic viscosity variation of the suspension that comprises microencapsulated PCM and MXene, as influenced by the MPCM MF, MXene MF, and temperature.These insights are derived from the GA-GPR model.According to the figure, a decrease in temperature leads to an increase in the DV of suspension.The severity of this phenomenon can be attributed to the behavior of the suspending medium and the particles within it.At lower temperatures, the kinetic energy of the particles decreases, causing them to move more slowly.As a result, the particles tend to interact and come closer together.This increased interaction and reduced particle On the other hand, increasing the concentration of MPCM causes a substantial increase in viscosity.As the concentration of MPCM increases, more particles are dispersed throughout the suspension.These particles have a certain size and shape, which affects their interaction with the suspending medium and other particles.When the concentration of MPCM is low, the particles are relatively dispersed and spaced apart within the suspension.This allows the suspending medium to flow more freely between the particles, resulting in lower resistance to flow and lower dynamic viscosity.However, as the concentration of MPCM increases, the particles become more closely packed together.This reduces the space available for the suspending medium to flow between the particles, leading to increased frictional forces and resistance to flow.Consequently, the dynamic viscosity of the suspension increases significantly.According to the figure, the effect of MXene NMs on viscosity is relatively tiny.However, it is essential to note that MXene have a noteworthy influence on other thermophysical properties of the suspension, such as thermal conductivity.

Conclusion
This study uses Gaussian process regression to accurately predict the dynamic viscosity of suspensions containing microencapsulated PCM and MXene nanomaterials.Twelve hyperparameters of GPR are analyzed separately and classified into three groups based on their importance.Three metaheuristic algorithms, namely GA, PSO, and MPA, are employed to optimize these HPs.The developed models provide engineers with a deep understanding of dynamic viscosity in pure and nano-enhanced MPCM suspensions.This research offers a tool for optimizing machine learning methods with numerous hyperparameters.These advancements significantly affect efficiency in various industries, from thermal energy storage to thermal management systems.Moreover, the models reduce experimental and computational costs, providing cost-effective and efficient solutions.The most important findings of this study are as follows: • Twelve HPs of GPR were investigated.Among these, the covariance function, basis function, standardization, and sigma were found to have the most significant impact on the GPR model.These HPs were optimized as part of the first group.• The second group of HPs incorporated the four primary HPs along with supplementary parameters that had a minimal impact (less than 1%) on the model's accuracy.These secondary HPs encompassed the fit method, predict method, and optimizer.• Altering the values of certain HPs (active set method, active set size, distance method, computation method, and regularization) did not impact the accuracy of the default model.• To explore the interactions between HPs, the third group was created, encompassing all of the HPs.
• The GA, PSO, and MPA algorithms achieved similar performance results when optimizing the first group of HPs.The R, R 2 , and I A values (0.998303, 0.996608, and 0.999135) indicated reasonable performance for all three algorithms.However, when considering errors, the MPA-GPR model showed a slightly lower MAPE on the testing dataset.• The models optimized with seven HPs in the second group, using various algorithms, showed similar accuracy to those in the first group.However, the PSO-based model outperformed the other two models in key evaluation metrics such as MSE, R, R 2 , and I A .• The optimized models, involving 12 different HPs, revealed that the genetic algorithm successfully identified a unique combination of HPs.The GA-GPR model exhibited excellent compliance with the targets, with high values of R (0.999224) and R 2 (0.998449) achieved on the testing data.• The statistical criteria, regression charts, violin plots, and absolute relative deviation charts revealed the significant superiority of the optimized model from the third group (12 HPs) compared to other HP groups.This highlights the importance of mutual interactions among HPs.
This study presents a valuable tool for engineers and decision-makers in industries reliant on microencapsulated phase change materials.Suspensions containing MCPMs and their characteristics are fundamental for optimizing processes and product performance in TES systems, construction, thermal management, and electronics.The proposed methodology of hyperparameter optimization for machine learning models empowers industries to achieve significant improvements in these fields.This translates into concrete advantages, including reduced laboratory experimentation costs, enhanced decision-making in material selection and process design, ultimately leading to optimized processes across a diverse range of applications.The efficacy of this approach is further substantiated by the success of metaheuristic algorithms in identifying optimal hyperparameter combinations, resulting in highly accurate models.

Fig. 1 .
Fig. 1.Concise overview of the modeling process employed in the present study.

Fig. 7 .a t e r n 5 2 a r d r a t i o n a l q u a d a r d s q u a r e d e x p e x p o n e n t i a l m a t e r n 3 2 m a t e r n 5 2 r a t i o n a l q u a d s q u a r e d e xa t e r n 5 2 a r d r a t i o n a l q u a d a r d s q u a r e d e x p e x p o n e n t i a l m a t e r n 3 2 m a t e r n 5 2 r a t i o n a l q u a d s q u a r e d e x pFig. 8 .
Fig. 7.The effect of different values of basis function, active set method, active set size, predict method, optimizer, and sigma on default model performance.

Fig. 9 .
Fig. 9.The best fitness and mean fitness values as the generation/iteration progress for different metaheuristic algorithms for the first group (4 HPs).

Fig. 13 .
Fig. 13.Comparison of the violin plots of the outputs of default and optimized models.

Fig. 14 .
Fig. 14.ARD of testing data points for various models.

Fig. 16 .
Fig. 16.Dynamic viscosity outputs of GA-GPR model for suspension containing MPCM and MXene under the influence of MPCM MF, MXene MF and Temperature.

Table 1 .
Statistical characteristics of experimental data.

Exp 2 Table 2 .
Parameters settings for the GA, PSO, and MPA algorithms.

End No Yes Enter Exp Data Train GPR Performance evaluation using testing data Start Select desired hyperparameters Select the best set of hyperparameters Define the search space Hyperparameters Optimization Kernel Function Basis Function Fit method Regularization Standardize data Sigma Value Computation Method Distance Method Active Set Size Active Set Method Predict Method Optimizer Objective Function Evaluation For Each Particle Update Velocities and Positions Objective Function Evaluation Update pbest and gbest Fitness Evaluation Selection Initial Population Crossover Mutation Initial Population Update predator's position Prey detection Social interaction Fitness Evaluation Update predator's fitness Initial Swarm Hyperparameters Optimization (GA, PSO, MPA) Genetic Algorithm
Exp × 100

Table 3 .
Default values of HPs.

Table 4 .
Performance of default model on testing and training datasets.

Table 5 .
Performance of optimized models (4 HPs) on testing and training datasets.The superior optimization technique is in bold.

Table 8 .
Values of optimized HPs for second group (7 HPs).The best fitness and mean fitness values as the generation/iteration progress for different metaheuristic algorithms for the second group (7 HPs).

Table 9 .
Performance Overall, this figure indicates that PSO remains the most efficient algorithm in terms of fast convergence and achieving low MSE, with GA and MPA also performing well but with different convergence rates.
of optimized models (12 HPs) on testing and training datasets.The superior optimization technique is in bold.

Table 10 .
Values of optimized HPs for third group(12 HPs).The best fitness and mean fitness values as the generation/iteration progress for different metaheuristic algorithms for the third group (12 HPs).